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Abstract 



Multi-electrode neurophysiological recordings produce massive quantities of data. 
Multivariate time series analysis provides the basic framework for analyzing the pat- 
terns of neural interactions in these data. It has long been recognized that neural 
interactions are directional. Being able to assess the directionality of neuronal inter- 
actions is thus a highly desired capability for understanding the cooperative nature 
of neural computation. Research over the last few years has shown that Granger 
causality is a key technique to furnish this capability. The main goal of this article 
is to provide an expository introduction to the concept of Granger causality. Math- 
ematical frameworks for both bivariate Granger causality and conditional Granger 
causality are developed in detail with particular emphasis on their spectral repre- 
sentations. The technique is demonstrated in numerical examples where the exact 
answers of causal influences are known. It is then applied to analyze multichannel 
local field potentials recorded from monkeys performing a visuomotor task. Our 
results are shown to be physiologically interpretable and yield new insights into the 
dynamical organization of large-scale oscillatory cortical networks. 
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1 Introduction 



In neuroscience, as in many other fields of science and engineering, signals of 
interest are often collected in the form of multiple simultaneous time series. 
To evaluate the statistical interdependence among these signals, one calcu- 
lates cross correlation functions in the time domain and ordinary coherence 
functions in the spectral domain. However, in many situations of interest, 
symmetric 1 measures like ordinary coherence are not completely satisfactory, 
and further dissection of the interaction patterns among the recorded signals 
is required to parcel out effective functional connectivity in complex networks. 
Recent work has begun to consider the causal influence one neural time series 
can exert on another. The basic idea can be traced back to Wiener [1] who 
conceived the notion that, if the prediction of one time series could be im- 
proved by incorporating the knowledge of a second one, then the second series 
is said to have a causal influence on the first. Wiener's idea lacks the machin- 
ery for practical implementation. Granger later formalized the prediction idea 
in the context of linear regression models [2] . Specifically, if the variance of the 
autoregressive prediction error of the first time series at the present time is 
reduced by inclusion of past measurements from the second time series, then 
the second time series is said to have a causal influence on the first one. The 
roles of the two time series can be reversed to address the question of causal 
influence in the opposite direction. From this definition it is clear that the flow 
of time plays a vital role in allowing inferences to be made about directional 
causal influences from time series data. The interaction discovered in this way 
may be reciprocal or it may be unidirectional. 

Two additional developments of Granger's causality idea are important. First, 
for three or more simultaneous time series, the causal relation between any 
two of the series may be direct, may be mediated by a third one, or may be 
a combination of both. This situation can be addressed by the technique of 
conditional Granger causality. Second, natural time series, including ones from 
economics and neurobiology, contain oscillatory aspects in specific frequency 
bands. It is thus desirable to have a spectral representation of causal influence. 
Major progress in this direction has been made by Geweke [3,4] who found 
a novel time series decomposition technique that expresses the time domain 
Granger causality in terms of its frequency content. In this article we review 
the essential mathematical elements of Granger causality with special empha- 
sis on its spectral decomposition. We then discuss practical issues concerning 
how to estimate such measures from time series data. Simulations are used to 
illustrate the theoretical concepts. Finally, we apply the technique to analyze 
the dynamics of a large-scale sensorimotor network in the cerebral cortex dur- 



Here by symmetric we mean that, when A is coherent with B, B is equally coherent 
with A. 
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ing cognitive performance. Our result demonstrates that, for a well designed 
experiment, a carefully executed causality analysis can reveal insights that are 
not possible with other techniques. 



2 Bivariate Time Series and Pairwise Granger Causality 



Our exposition in this and the next section follows closely that of Geweke 
[3,4]. To avoid excessive mathematical complexity we develop the analysis 
framework for two time series. The framework can be generalized to two sets 
of time series [3]. 



2.1 Time Domain Formulation 



Consider two stochastic processes X t and Y t . Assume that they are jointly 
stationary. Individually, under fairly general conditions, each process admits 
an autoregressive representation 

oo 

x t = J2 a ^ X t-i + e i*> var(eit) = Si. (1) 

3=1 

oo 

Y t = E d U Y t-3 + Vu, varfau) = IY (2) 

3=1 

Jointly, they are represented as 

oo oo 

X t = E a 2J X t~j + E h 1i Y t-j + €2t, (3) 
3=1 3=1 

oo oo 

Yt = E c 23 x t-j + E hjYt-j + V2t, (4) 

3=1 3=1 

where the noise terms are uncorrelated over time and their contemporaneous 
covariance matrix is 



S = 



S2 T 2 

To r 9 



(5) 



The entries are defined as S 2 = var(e2t),r 2 = var(^ 2 t),T 2 = cov(e 2t , i]2t)- If 
X t and Y t are independent, then {b 2 j} and {c 2 j} are uniformly zero, T 2 = 0, 
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Si = S 2 and r\ = r 2 . This observation motivates the definition of total 
interdependence between X t and Y t as 

F x ,y = In 5^1 (6) 



where | • | denotes the determinant of the enclosed matrix. According to this 
definition, Fx,y = when the two time series are independent, and Fx,y > 
when they are not. 

Consider Eqs. (1) and (3). The value of Si measures the accuracy of the 
autoregressive prediction of X t based on its previous values, whereas the value 
of S 2 represents the accuracy of predicting the present value of X t based on 
the previous values of both X t and Y t . According to Wiener [1] and Granger 
[2], if S 2 is less than S x in some suitable statistical sense, then Y t is said to 
have a causal influence on X t . We quantify this causal influence by 

i^* = ln^. (7) 

^2 



It is clear that Fy-^x = when there is no causal influence from Y to X and 
Fy~>x > when there is. Similarly, one can define causal influence from X to 

Y as 

F^ Y = ln^. (8) 

1 2 



It is possible that the interdependence between X t and Y t cannot be fully 
explained by their interactions. The remaining interdependence is captured 
by T 2 , the covariance between e 2t and r] 2 t- This interdependence is referred to 
as instantaneous causality and is characterized by 

Fx - y = ln lip (9) 

When T 2 is zero, F x .y is also zero. When T 2 is not zero, F x .y > 0. 
The above definitions imply that 

F x ,y = F X ->y + F Y -+x + F X -y- (10) 



Thus we decompose the total interdependence between two time series X t 
and Y t into three components: two directional causal influences due to their 
interaction patterns, and the instantaneous causality due to factors possibly 
exogenous to the (X,Y) system (e.g. a common driving input). 
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2.2 Frequency Domain Formulation 



To begin we define the lag operator L to be LX t = X t _ ± . Rewrite Eqs. (3) 
and (4) in terms of the lag operator 



) 


<x t \ 


- 










life J 



where a 2 (0) = l,fo 2 (0) = 0,c 2 (0) = 0,d 2 (0) 
sides of Eq. (11) leads to 



(11) 

1. Fourier transforming both 











d 2 (u) J 









where the components of the coefficient matrix A (a;) are 



a 2 (u) = 1 - Yl a 2j e~^, b 2 (cu) = - ^ b 2j e-^ j , 
i=i j=i 



c 2 (u) = - ^ c 2j e tuJJ , d 2 (u) = 1-51 d 2j e ~ 

3=1 3=1 



■iuij 



Recasting Eq. (12) into the transfer function format we obtain 



(12) 



/ X ( UJ ) N 


1- 


' H xx (lo) 




' E x 










Hyy{u) j 


\ E y 





where the transfer function is H(u;) = A 1 (cu) whose components are 

1 



H xx {uj) 



-d 2 (u), H xy {u) 



-62 H, 



detA ' K n xyv ' detA 
1 1 

After proper ensemble averaging we have the spectral matrix 
S(w) = H(cj)SH*(cj) 



(13) 



(14) 



(15) 



where * denotes complex conjugate and matrix transpose. 
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The spectral matrix contains cross spectra and auto spectra. If X t and Y t are 
independent, then the cross spectra are zero and |S(c<;)| equals the product of 
two auto spectra. This observation motivates the spectral domain representa- 
tion of total interdependence between X t and Y t as 

fr,rM = ln |s( ^ , (16) 

where |S(u;)| = S xx (uj)S yy (uj) - S xy (uj)S yx (uj) and S yx (uj) = S* y (u). It is easy 
to see that this decomposition of interdependence is related to coherence by 
the following relation: 

/x,y(w) = -Ml-C(o/)), (17) 
where coherence is defined as 

o xx \0J)a yy [uj) 

The coherence defined in this way is sometimes referred to as the squared 
coherence. 

To obtain the frequency decomposition of the time domain causality defined 
in the previous section, we look at the auto spectrum of X t : 

S xx {u) = H xx {u)H 2 H* xx {u) + 2T 2 Re{H xx {u)H* xy {u)) + H xy {u)T 2 H* xy {u).{l$) 

It is instructive to consider the case where T 2 = 0. In this case there is no 
instantaneous causality and the interdependence between X t and Y t is entirely 
due to their interactions through the regression terms on the right hand sides 
of Eqs. (3) and (4). The spectrum has two terms. The first term, viewed as 
the intrinsic part, involves only the variance of e 2 t, which is the noise term 
that drives the X t time series. The second term, viewed as the causal part, 
involves only the variance of r] 2t) which is the noise term that drives Y t . This 
power decomposition into an "intrinsic" term and a "causal" term will become 
important for defining a measure for spectral domain causality. 

When T 2 is not zero it becomes harder to attribute the power of the X t series to 
different sources. Here we consider a transformation introduced by Geweke [3] 
that removes the cross term and makes the identification of an intrinsic power 
term and a causal power term possible. The procedure is called normalization 
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and it consists of left-multiplying 



1 

T 2 
1 

V s 2 J 



(19) 



on both sides of Eq. (12). The result is 



a 2 (cu) b 2 (uo) 



\c 3 (oj) d 3 (u) J 
where 03(00) = 02(0; 




(20) 



a 2 (u),d 3 (uj) = d 2 (u) 



■b 2 (uj),E y (uj) = E y (u) 



E x (uj). The new transfer function H(oj) for (20) is the inverse of the new 



coefficient matrix A(uj): 



HM 



H xx (uj) H xy (uj) 



d 3 (u) -b 2 (u) 



^H yx (uj) H yy {u) j detA \^-c 3 (c^) a 2 (u) 



(21) 



Since det A = det A we have 



T 2 

H xx (oj) = H xx (u) + —H xy (uj), H xy (uj) = H xy (oj), 

Hy X (u) = Hy X {uj) + — H XX (Uj), Hyy(0j) = Hyy(u). 

^2 



(22) 



From the construction it is easy to see that E x and E y are uncorrelated, that is, 
cov(E x , E y ) = 0. The variance of the noise term for the normalized Y t equation 
"f 2 

is T 2 = T 2 — — -. From Eq. (20), following the same steps that lead to Eq. 
S 2 

(18), the spectrum of X t is found to be: 

S xx (u) = H xx (u)^ 2 HI x (uj) + H xy (Lu)T 2 H* xy (Lu). (23) 



Here the first term is interpreted as the intrinsic power and the second term 
as the causal power of X t due to Y t . This is an important relation because it 
explicitly identifies that portion of the total power of X t at frequency u that is 
contributed by Y t . Based on this interpretation we define the causal influence 
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from Y t to X t at frequency lo as 

/y ^M=ln (24) 



Note that this definition of causal influence is expressed in terms of the intrinsic 
power rather than the causal power. It is expressed in this way so that the 
causal influence is zero when the causal power is zero (i.e., the intrinsic power 
equals the total power), and increases as the causal power increases (i.e., the 
intrinsic power decreases). 



( \ -T 2 /r 2 



By taking the transformation matrix as 

\° 1 

same analysis, we get the causal influence from X t to Y t : 



and performing the 



/^H=ln y> (25) 



T 2 

where H yy (u) = H yy (u) + —H yx (u). 

1 2 

By defining the spectral decomposition of instantaneous causality as [5] 

fx.rH - In ^ , (26) 



we achieve a spectral domain expression for the total interdependence that is 
analogous to Eq. (10) in the time domain, namely: 

fxA") = fxM") + fv^xiou) + f x . Y {u). (27) 



We caution that the spectral instantaneous causality may become negative for 
some frequencies in certain situations and may not have a readily interpretable 
physical meaning. 

It is important to note that, under general conditions, these spectral measures 
relate to the time domain measures as: 
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1 n 

— TT 
1 ^ 

— TT 

F y-x = ^J f Y .x(w)du. (28) 



The existence of these equalities gives credence to the spectral decomposition 
procedures described above. 



3 Trivariate Time Series and Conditional Granger Causality 



For three or more time series one can perform a pairwise analysis and thus 
reduce the problem to a bivariate problem. This approach has some inherent 
limitations. For example, for the two coupling schemes in Figure 1, a pairwise 
analysis will give the same patterns of connectivity like that in Figure 1(b). 
Another example involves three processes where one process drives the other 
two with differential time delays. A pairwise analysis would indicate a causal 
influence from the process that receives an early input to the process that 
receives a late input. To disambiguate these situations requires additional 
measures. Here we define conditional Granger causality which has the ability to 
resolve whether the interaction between two time series is direct or is mediated 
by another recorded time series and whether the causal influence is simply due 
to differential time delays in their respective driving inputs. Our development 
is for three time series. The framework can be generalized to three sets of time 
series [4]. 



3.1 Time Domain Formulation 



Consider three stochastic processes X t , Y t and Z t . Suppose that a pairwise 
analysis reveals a causal influence from Y t to X t . To examine whether this 
influence has a direct component (Figure 1(b)) or is mediated entirely by 
Z t (Figure 1(a)) we carry out the following procedure. First, let the joint 
autoregressive representation of X t and Z t be 

X t = £ asjX^j + £ byZt-j + e 3t , (29) 

3=1 3=1 
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(a) (b) 

Fig. 1. Two distinct patterns of connectivity among three time series. A pairwise 
causality analysis cannot distinguish these two patterns. 

oo oo 

Z t = J2 °3j X t-j + d ^j z t-j + 73t, (30) 



where the covariance matrix of the noise terms is 




Next we consider the joint autoregressive representation of all three processes 



X t , Y t and Z t 










oo 
3=1 


oo 

3=1 


oo 

-j + E C ^3 Z t- 
3=1 


-j + 644, 


(32) 


oo 
3=1 


oo 

■3 + E e ^ Y t- 


oo 

■3 + E# 4 .?^*- 

i=i 




(33) 


oo 

z t = Yl u ±j x t- 


oo 

-3+J2 V *3 Y t- 
3=1 


oo 
3=1 


-j + 74* j 


(34) 



where the covariance matrix of the noise terms is 

11 



/ y y y \ 

^xy '-'xz 



-'yx 



J yy 



J yz 



y y y 

^zx ^zy ^z 



From these two sets of equations we define the Granger causality from Y t to 
X t conditional on Z t to be 



In 



(35) 



The intuitive meaning of this definition is quite clear. When the causal influ- 
ence from Y t to X t is entirely mediated by Z t (Fig. 1(a)), {b^} is uniformly 
zero, and E xx = S 3 . Thus, we have Fy^x\z = 0, meaning that no further 
improvement in the prediction of X t can be expected by including past mea- 
surements of Yt- On the other hand, when there is still a direct component 
from Y t to X t (Fig. 1(b)), the inclusion of past measurements of Y t in addition 
to that of X t and Z t results in better predictions of X t , leading to T, xx < S 3 , 
and F Y ^x\z > 0. 



3.2 Frequency Domain Formulation 



To derive the spectral decomposition of the time domain conditional Granger 
causality we carry out a normalization procedure like that for the bivariate 
case. For Eqs. (29) and (30) the normalized equations are 



D U (L) D 12 {L) 



\D 21 (L) D 22 (L) 









) 




■{*) 









(36) 



where D u (0) = 1,D 22 (0) = 1,D 12 (0) = 0, cov{x* t ,z* t ) = 0, and D 21 (0) is 
generally not zero. We note that var(xj*) = S 3 and this becomes useful in 
what follows. 

For Eqs. (32), (33) and (34) the normalization process involves left-multiplying 
both sides by the matrix 



P = P 2 Pi 



where 



12 



Pi 



and 







1 



1 

1 

— (S Z y — Tj Z xTi x ^Ti X y)(Tjyy — HyxTl^Tlxy) 1 1 



We denote the normalized equations as 



( B U {L) B 12 (L) B 13 (L)\ 



B 21 {L) B 22 {L) B 23 {L) 
B 31 (L) B 32 (L) B 33 (L) 



Vt 
\ Zt J 



K e ztJ 



(37) 



where the noise terms are independent, and their respective variances are E xx , 
Tiyy and 

^zz- 

To proceed further we need the following important relation [4] 



Fy^x\z — Fyz*^x* 



(38) 



and its frequency domain counterpart: 

fY^>X\z(u) = fyZ'^X*^)- 



(39) 



To obtain fyz*->x*(u), we need to decompose the spectrum of X*. The Fourier 
transform of Eqs. (36) and (37) gives: 



f X{u) 
K Z{u) 



G zx (uj) G zz {u) 



(40) 



and 
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Y{u) 
Z(u) 



H yx {uj) 



Hyy(Uj) 



H x Ju) 



H zx (u) H zy (u) H zz (u) 



E x (u) 

Ey(uj) 

E z (cu) 



(41) 



Assuming that X(uo) and Z(uo) from Eq. (40) can be equated with that from 
Eq. (41), we combine Eq. (40) and Eq. (41) to yield, 



1 X*(oj)\ 
Y{u) 
Z*(u) 



G xx {") G xz {u) 

1 

yG zx (u) G zz (u) J y 

Q XX (u) Q X y(uj) Q X z{u) 

Qyx(u) Qyy{u) Q yz (uj) 

K Qzx(u) Q zy (uj) Q zz (u) J y 



\ - 1 / \ / 

H xx (u) H xy (u) H xx (u) 



H yx (u) HyyiyUj) 

H zx (u) H zy (u) 



1 ' EJuA 



H yz (u) 
H zz (uj) 



E y {u) 
E z (co) 



E x (uj)\ 

Ey(uj) 

E z {u) j 
(42) 



where Q(c<j) = G 1 (u)H(u). After suitable ensemble averaging, the spectral 
matrix can be obtained from which the power spectrum of X* is found to be 

S x * x *(u>) = Q xx (uj)t xx Q* xx (uj) + Q X y(Lu)t yy Q* xy (Lu) + Q xz (uj)t zz Q* xz (uj).(A3) 



The first term can be thought of as the intrinsic power and the remaining two 
terms as the combined causal influences from Y and Z*. This interpretation 
leads immediately to the definition 



YZ* 



(u) = In 



Ml 



^XX 



(44) 



We note that S x * x * (cu) is actually the variance of e 3t as pointed out earlier. On 
the basis of the relation in Eq. (39), the final expression for Granger causality 
from Y t to X t conditional on Z t is 



(45) 



f Y ^x\z(v) = In 3 — 

^ xx 



It can be shown that fy^x\z(^) relates to the time domain measure Fy->x\z 
via 



*Y^X\Z 



1 

7^ J f Y ^x\z(uj)du>, 
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under general conditions. 

The above derivation is made possible by the key assumption that X{uj) and 
Z(uo) in Eq. (40) and in Eq. (41) are identical. This certainly holds true on 
purely theoretical grounds, and it may very well be true for simple mathemati- 
cal systems. For actual physical data, however, this condition may be very hard 
to satisfy due to practical estimation errors. In a recent paper we developed a 
partition matrix technique to overcome this problem [6]. The subsequent cal- 
culations of conditional Granger causality are based on this partition matrix 
procedure. 



4 Estimation of Autoregressive Models 

The preceding theoretical development assumes that the time series can be 
well represented by autoregressive processes. Such theoretical autoregressive 
processes have infinite model orders. Here we discuss how to estimate autore- 
gressive models from empirical time series data, with emphasis on the incorpo- 
ration of multiple time series segments into the estimation procedure [7]. This 
consideration is motivated by the goal of applying autoregressive modeling 
in neuroscience. It is typical in behavioral and cognitive neuroscience exper- 
iments for the same event to be repeated on many successive trials. Under 
appropriate conditions, time series data recorded from these repeated trials 
may be viewed as realizations of a common underlying stochastic process. 

Let X t = [Xu, X2t, ■ • • , Xpt] be a p dimensional random process. Here T 
denotes matrix transposition. In multivariate neural data, p represents the 
total number of recording channels. Assume that the process X t is stationary 
and can be described by the following mth order autoregressive equation 



where A(i) are pxp coefficient matrices and E t = [E u , E 2 t, ■ ■ ■ , E pt ] T is a zero 
mean uncorrelated noise vector with covariance matrix S. 

To estimate A(i) and S, we multiply Eq. (46) from the right by X_J_ k , where 
k — 1, 2, • • • , m. Taking expectations, we obtain the Yule- Walker equations 



where R(n) =< X t X^ n > is X t 's covariance matrix of lag n. In deriving 
these equations, we have used the fact that < E t X.J_ k >= as a result of E t 
being an uncorrelated process. 




(46) 




(47) 
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For a single realization of the X process, {xj}^, we compute the covariance 
matrix in Eq. (47) according to 



N-n 

N-n 



Mn) = ^ E (48) 
i=i 



If multiple realizations of the same process are available, then we compute 
the above quantity for each realization, and average across all the realizations 
to obtain the final estimate of the covariance matrix. (Note that for a single 
short trial of data one uses the divisor N for evaluating covariance to reduce 
inconsistency. Due to the availability of multiple trials in neural applications, 
we have used the divisor (N — n) in the above definition Eq. (48) to achieve an 
unbiased estimate.) It is quite clear that, for a single realization, if N is small, 
one will not get good estimates of R(n) and hence will not be able to obtain 
a good model. This problem can be overcome if a large number of realizations 
of the same process is available. In this case the length of each realization can 
be as short as the model order m plus 1. 

Equations (46) contain a total of mp 2 unknown model coefficients. In (47) 
there is exactly the same number of simultaneous linear equations. One can 
simply solve these equations to obtain the model coefficients. An alternative 
approach is to use the Levinson, Wiggins, Robinson (LWR) algorithm, which is 
a more robust solution procedure based on the ideas of maximum entropy. This 
algorithm was implemented in the analysis of neural data described below. The 
noise covariance matrix S may be obtained as part of the LWR algorithm. 
Otherwise one may obtain X through 

m 

S = R(0) +£)A(i)R(i). (49) 
i=i 



Here we note that R T (A;) = R(— k). 

The above estimation procedure can be carried out for any model order m. 
The correct m is usually determined by minimizing the Akaike Information 
Criterion (AIC) defined as 

AIC(m) = 2 log[det(X)] + -J^- (50) 

^total 



where A^otal is the total number of data points from all the trials. Plotted as 
a function of m the proper model order correspond to the minimum of this 
function. It is often the case that for neurobiological data -^V"total * s ver ^ ^ ar S e - 
Consequently, for a reasonable range of m, the AIC function does not achieve 
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a minimum. An alternative criterion is the Bayesian Information Criterion 
(BIC), which is defined as 

BIC(m) = 2 log[det(S)] + ^otai (51) 

^total 



This criterion can compensate for the large number of data points and may 
perform better in neural applications. A final step, necessary for determining 
whether the autoregressive time series model is suited for a given data set, is to 
check whether the residual noise is white. Here the residual noise is obtained 
by computing the difference between the model's predicted values and the 
actually measured values. 

Once an autoregressive model is adequately estimated, it becomes the basis 
for both time domain and spectral domain causality analysis. Specifically, in 
the spectral domain Eq. (46) can be written as 

X(lo) = H(w)E(w) (52) 



where 



HHHEAOOe-^)- 1 (53) 

3=0 



is the transfer function with A(0) being the identity matrix. From Eq. (52), 
after proper ensemble averaging, we obtain the spectral matrix 

S(u;) = H(o;)£H*(cj). (54) 



Once we obtain the transfer function, the noise covariance, and the spectral 
matrix, we can then carry out causality analysis according to the procedures 
outlined in the previous sections. 



5 Numerical Examples 



In this section we consider three examples that illustrate various aspects of 
the general approach outlined earlier. 
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5.1 Example 1 



Consider the following AR(2) model: 
X t = 0.9X t _! - 0.5X,_ 2 + e t 

y t = o.8y t _i-o.5y t _ 2 + o.i6x t _ 1 -o.2x t _ 2 + 7^ 



where e t , r) t are gaussian white noise processes with zero means and variances 
a\ = 1,<7| — 0.7, respectively. The covariance between e t and i] t is 0.4. From 
the construction of the model, we can see that X t has a causal influence on Y t 
and that there is also instantaneous causality between X t and Y t . 

We simulated Eq. (55) to generate a data set of 500 realizations of 100 time 
points each. Assuming no knowledge of Eq. (55) we fitted a MVAR model on 
the generated data set and calculated power, coherence and Granger causality 
spectra. The result is shown in Figure 2. The interdependence spectrum is 
computed according to Eq. (17) and the total causality is defined as the sum 
of directional causalities and the instantaneous causality. The result clearly 
recovers the pattern of connectivity in Eq. (55). It also illustrates that the 
interdependence spectrum, as computed according to Eq. (17), is almost iden- 
tical to the total causality spectrum as defined on the right hand side of Eq. 
(27). 



5.2 Example 2 

Here we consider two models. The first consists of three time series simulating 
the case shown in Figure 1(a), in which the causal influence from Y t to X t is 
indirect and completely mediated by Z t : 

X t = 0.8X t _! - 0.5X t _ 2 + 0AZ t ^ + e t 

Y t = 0.9Y t _i - 0.8Y_ 2 + 6 (56) 
Z t = 0.5Z t _! - 0.2Z t _ 2 + 0.5Y t _! + r] t . 

The second model creates a situation corresponding to Figure 1(b), containing 
both direct and indirect causal influences from Y t to X t . This is achieved by 
using the same system as in Eq. (56), but with an additional term in the first 
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Fig. 2. Simulation results for an AR(2) model consisting of two coupled time series. 
Power (black for X, gray for Y) spectra, interdependence spectrum (related to the 
coherence spectrum), and Granger causality spectra are displayed. Note that the 
total causality spectrum, representing the sum of directional causalities and the 
instantaneous causality, is nearly identical to the interdependence spectrum. 

equation: 

X t = 0.8X t _! - 0.5X t _ 2 + 0.4Z t _! + 0.2Y t _ 2 + e t 

y t = 0.9Y t _ 1 -0.8Y t _ 2 + & (57) 
Z t = 0.5Z t _i - 0.2Z t _ 2 + 0.5Y_x + r) t . 



For both models. e(t),£(t),r)(t) are three independent gaussian white noise 
processes with zero means and variances of a\ = 0.3, a\ — 1, a\ — 0.2, respec- 
tively. 

Each model was simulated to generate a data set of 500 realizations of 100 
time points each. First, pairwise Granger causality analysis was performed on 
the simulated data set of each model. The results are shown in Figure 3(a), 
with the dashed curves showing the results for the first model and the solid 
curves for the second model. From these plots it is clear that pairwise analysis 
cannot differentiate the two coupling schemes. This problem occurs because 
the indirect causal influence from Y t to X t that depends completely on Z t in 
the first model cannot be clearly distinguished from the direct influence from 
Y t to X t in the second model. Next, conditional Granger causality analysis was 
performed on both simulated data sets. The Granger causality spectra from 
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Fig. 3. Simulation results for three coupled time series. Two distinct patterns of 
connectivity as that illustrated in Fig. 1 are considered. Results for the case with a 
direct causal influence are shown as solid curves and the results for the case with 
indirect causal influence are shown as dashed curves, (a) Pairwise Granger causality 
analysis gives very similar results for both cases which indicates that the pairwise 
analysis cannot differentiate these two patterns of connectivity, (b) Conditional 
causality analysis shows a nonzero spectrum (solid) for the direct case and almost 
zero spectrum (dashed) for the indirect case. 



Y t to X t conditional on Z t are shown in Figure 3(b), with the second model's 
result shown as the solid curve and the first model's result as the dashed curve. 
Clearly, the causal influence from Y t to X t that was prominent in the pairwise 
analysis of the first model in Figure 3(a), is no longer present in Figure 3(b). 
Thus, by correctly determining that there is no direct causal influence from Y t 
to X t in the first model, the conditional Granger causality analysis provides 
an unambiguous dissociation of the coupling schemes represented by the two 
models. 
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5.3 Example 3 



We simulated a 5-node oscillatory network structurally connected with differ- 
ent delays. This example has been analyzed with partial directed coherence 
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Fig. 4. Simulation results for a five-node network structurally connected with dif- 
ferent time delays, (a) Schematic illustration of the system, (b) Calculated power 
spectra are shown in the diagonal panels, results of pairwise (solid) and conditional 
Granger causality analysis (dashed) are in off-diagonal panels. Granger causal influ- 
ence is from the horizontal index to the vertical index. Features of Granger causality 
spectra (both pairwise and conditional) are consistent with that of power spectra. 



and directed transfer function methods in [12]. The network involves the fol- 
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lowing multivariate autoregressive model 



X lt = 0.95V2X 1{t _ 1) - 0.9025X 1(t _ 2) + e lt 
X 2 t = 0.5Xi( t _ 2 ) + £2t 

X 3t = -0.4X 1(t _ 3) + e 3t (58) 
X u = -0.5X 1(t _ 2) + 0.25^X4(^1) + 0.25^X5^-1) + e 4t 
X 5t = -0.25^X4(^1) + 0.25^X5(^1) + e 5t , 



where ei t , e 2t , e 3 (, e 4 (, e 5 i are independent gaussian white noise processes with 
zero means and variances of a\ = 0.6, a\ = 0.5, a\ = 0.3, a\ = 0.3, cr^ = 0.6, 
respectively. The structure of the network is illustrated in Figure 4(a). 



We simulated the network model to generate a data set of 500 realizations 
each with 10 time points. Assuming no knowledge of the model, we fitted 
a 5th order MVAR model on the generated data set and performed power 
spectra, coherence and Granger causality analysis on the fitted model. The 
results of power spectra are given in the diagonal panels of Figure 4(b). It 
is clearly seen that all five oscillators have a spectral peak at around 25Hz 
and the fifth has some additional high frequency activity as well. The results 
of pairwise Granger causality spectra are shown in the off-diagonal panels of 
Figure 4(b) (solid curves). Compared to the network diagram in Figure 4(a) 
we can see that pairwise analysis yields connections that can be the result of 
direct causal influences (e.g. 1 — > 2), indirect causal influences (e.g. 1 — > 5) 
and differentially delayed driving inputs (e.g. 2 — > 3). We further performed 
a conditional Granger causality analysis in which the direct causal influence 
between any two nodes are examined while the influences from the other three 
nodes are conditioned out. The results are shown as dashed curves in Figure 
4(b). For many pairs the dashed curves and solid curves coincide (e.g. 1 — > 2), 
indicating that the underlying causal influence is direct. For other pairs the 
dashed curves become zero, indicating that the causal influences in these pairs 
are either indirect are the result of differentially delayed inputs. These results 
demonstrate that conditional Granger causality furnishes a more precise net- 
work connectivity diagram that matches the known structural connectivity. 
One noteworthy feature about Figure 4(b) is that the spectral features (e.g. 
peak frequency) are consistent across both power and Granger causality spec- 
tra. This is important since it allows us to link local dynamics with that of 
the network. 
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6 Analysis of a Beta Oscillation Network in Sensorimotor Cortex 

A number of studies have appeared in the neuroscience literature where the is- 
sue of causal effects in neural data is examined [6,8,9,10,11,12,13,14,15]. Three 
of these studies [9,10,15] used the measures presented in this article. Below we 
review one study published by our group [6,15]. 

Local field potential data were recorded from two macaque monkeys using 
transcortical bipolar electrodes at 15 distributed sites in multiple cortical ar- 
eas of one hemisphere (right hemisphere in monkey GE and left hemisphere 
in monkey LU) while the monkeys performed a GO/NO-GO visual pattern 
discrimination task [16]. The prestimulus stage began when the monkey de- 
pressed a hand lever while monitoring a display screen. This was followed from 
0.5 to 1.25 sec later by the appearance of a visual stimulus (a four-dot pat- 
tern) on the screen. The monkey made a GO response (releasing the lever) or 
a NO-GO response (maintaining lever depression) depending on the stimulus 
category and the session contingency. The entire trial lasted about 500 ms, 
during which the local field potentials were recorded at a sampling rate of 200 
Hz. 

Previous studies have shown that synchronized beta- frequency (15-30 Hz) os- 
cillations in the primary motor cortex are involved in maintaining steady con- 
tractions of contralateral arm and hand muscles. Relatively little is known, 
however, about the role of postcentral cortical areas in motor maintenance 
and their patterns of interaction with motor cortex. Making use of the si- 
multaneous recordings from distributed cortical sites we investigated the in- 
terdependency relations of beta-synchronized neuronal assemblies in pre- and 
postcentral areas in the prestimulus time period. Using power and coherence 
spectral analysis, we first identified a beta-synchronized large-scale network 
linking pre- and postcentral areas. We then used Granger causality spectra 
to measure directional influences among recording sites, ascertaining that the 
dominant causal influences occurred in the same part of the beta frequency 
range as indicated by the power and coherence analysis. The patterns of sig- 
nificant beta-frequency Granger causality are summarized in the schematic 
Granger causality graphs shown in Figure 5. These patterns reveal that, for 
both monkeys, strong Granger causal influences occurred from the primary 
somatosensory cortex (SI) to both the primary motor cortex (Ml) and infe- 
rior posterior parietal cortex (7a and 76), with the latter areas also exerting 
Granger causal influences on the primary motor cortex. Granger causal influ- 
ences from the motor cortex to postcentral areas, however, were not observed 2 . 



2 A more stringent significance threshold was applied here which resulted in elim- 
ination of several very small causal influences that were included in the previous 
report. 
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Our results are the first to demonstrate in awake monkeys that synchronized 
beta oscillations not only bind multiple sensorimotor areas into a large-scale 
network during motor maintenance behavior, but also carry Granger causal 
influences from primary somatosensory and inferior posterior parietal cortices 
to motor cortex. Furthermore, the Granger causality graphs in Figure 5 pro- 
vide a basis for fruitful speculation about the functional role of each cortical 
area in the sensorimotor network. First, steady pressure maintenance is akin 
to a closed loop control problem and as such, sensory feedback is expected 
to provide critical input needed for cortical assessment of the current state 
of the behavior. This notion is consistent with our observation that primary 
somatosensory area (SI) serves as the dominant source of causal influences to 
other areas in the network. Second, posterior parietal area 7b is known to be 
involved in nonvisually guided movement. As a higher-order association area 
it may maintain representations relating to the current goals of the motor 
system. This would imply that area 7b receives sensory updates from area SI 
and outputs correctional signals to the motor cortex (Ml). This conceptu- 
alization is consistent with the causality patterns in Figure 5. As mentioned 
earlier, previous work has identified beta range oscillations in the motor cortex 
as an important neural correlate of pressure maintenance behavior. The main 
contribution of our work is to demonstrate that the beta network exists on a 
much larger scale and that postcentral areas play a key role in organizing the 
dynamics of the cortical network. The latter conclusion is made possible by 
the directional information provided by Granger causality analysis. 




(b) 



Fig. 5. Granger causality graphs for monkey GE (a) and monkey LU (b). 
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Since the above analysis was pairwise, it had the disadvantage of not distin- 
guishing between direct and indirect causal influences. In particular, in monkey 
GE, the possibility existed that the causal influence from area SI to inferior 
posterior parietal area 7a was actually mediated by inferior posterior parietal 
area 7b (Figure 5(a)). We used conditional Granger causality to test the hy- 
pothesis that the SI — > 7a influence was mediated by area 7b. In Figure 6(a) 
is presented the pairwise Granger causality spectrum from SI to 7a (SI — > 7a, 
dark solid curve), showing significant causal influence in the beta frequency 
range. Superimposed in Figure 6(a) is the conditional Granger causality spec- 
trum for the same pair, but with area 7b taken into account (SI — > 7a\7b, light 
solid curve). The corresponding 99% significance thresholds are also presented 
(light and dark dashed lines coincide). These significance thresholds were de- 
termined using a permutation procedure that involved creating 500 permu- 
tations of the local field potential data set by random rearrangement of the 
trial order independently for each channel (site). Since the test was performed 
separately for each frequency, a correction was necessary for the multiple com- 
parisons over the whole range of frequencies. The Bonferroni correction could 
not be employed because these multiple comparisons were not independent. 
An alternative strategy was employed following Blair and Karniski [17]. The 
Granger causality spectrum was computed for each permutation, and then the 
maximum causality value over the frequency range was identified. After 500 
permutation steps, a distribution of maximum causality values was created. 
Choosing a p- value at p — 0.01 for this distribution gave the thresholds shown 
in Figure 6(a),(b) and (c) as dashed lines. 

We see from Figure 6(a) that the conditional Granger causality is greatly re- 
duced in the beta frequency range and no longer significant, meaning that the 
causal influence from SI to 7a is most likely an indirect effect mediated by 7b. 
This conclusion is consistent with the known neuroanatomy of the sensorimo- 
tor cortex [18] in which area 7a receives direct projections from area 7b which 
in turn receives direct projections from the primary somatosensory cortex. No 
pathway is known to project directly from the primary somatosensory cortex 
to area 7a. 

From Figure 5(a) we see that the possibility also existed that the causal influ- 
ence from SI to the primary motor cortex (Ml) in monkey GE was mediated 
by area 7b. To test this possibility, the Granger causality spectrum from SI to 
Ml (SI — > Ml, dark solid curve in Figure 6(b)) was compared with the con- 
ditional Granger causality spectrum with 7b taken into account (SI — > Ml|76, 
light solid curve in Figure 6(b)). In contrast to Figure 6(a), we see that the 
bet a- frequency conditional Granger causality in Figure 6(b) is only partially 
reduced, and remains well above the 99% significance level. From Figure 4(b), 
we see that the same possibility existed in monkey LU of the SI to Ml causal 
influence being mediated by 7b. However, just as in Figure 6(b), we see in Fig- 
ure 6(c) that the beta- frequency conditional Granger causality for monkey LU 
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Fig. 6. Comparison of pairwise and conditional Granger causality spectra for monkey 
GE (a and b), and monkey LU (c). 

is only partially reduced, and remains well above the 99% significance level. 



The results from both monkeys thus indicate that the observed Granger causal 
influence from the primary somatosensory cortex to the primary motor cortex 
was not simply an indirect effect mediated by area 7b. However, we further 
found that area 7b did play a role in mediating the SI to Ml causal influence 
in both monkeys. This was determined by comparing the means of bootstrap 
resampled distributions of the peak beta Granger causality values from the 
spectra of SI — > Ml and SI — > Ml\7b by the Student's t-test. The significant 
reduction of beta-frequency Granger causality when area 7b is taken into ac- 
count (t = 17.2 for GE; t = 18.2 for LU, p «< 0.001 for both), indicates 
that the influence from the primary somatosensory to primary motor area was 
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partially mediated by area 7b. Such an influence is consistent with the known 
neuroanatomy [18] where the primary somatosensory area projects directly to 
both the motor cortex and area 7b, and area 7b projects directly to primary 
motor cortex. 



7 Summary 

In this article we have introduced the mathematical formalism for estimating 
Granger causality in both time and spectral domain from time series data. 
Demonstrations of the technique's utilities are carried out on both simulated 
data, where the patterns of interactions are known, and on local field potential 
recordings from monkeys performing a cognitive task. For the latter we have 
stressed the physiological interpretability of the findings and pointed out the 
new insights afforded by these findings. It is our belief that Granger causality 
offers a new way of looking at cooperative neural computation and it enhances 
our ability to identify key brain structures underlying the organization of a 
given brain function. 
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